Sequential Transfer in Multi-armed Bandit with Finite Set of Models

Authors

  • Mohammad Gheshlaghi Azar
  • Alessandro Lazaric
  • Emma Brunskill

Abstract

Learning from prior tasks and transferring that experience to improve future performance is critical for building lifelong learning agents. Although results in supervised and reinforcement learning show that transfer may significantly improve learning performance, most of the literature on transfer focuses on batch learning tasks. In this paper we study the problem of sequential transfer in online learning, notably in the multi-armed bandit framework, where the objective is to minimize the total regret over a sequence of tasks by transferring knowledge from prior tasks. We introduce a novel bandit algorithm based on a method-of-moments approach for estimating the possible tasks and derive regret bounds for it.
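The abstract describes exploiting knowledge of a finite set of possible tasks when solving each new bandit problem. As an illustration only, the sketch below assumes that set of candidate models (the vector of arm means for each possible task) is already known and shows one way a learner can use it: keep only the models consistent with per-arm confidence intervals, then play the best arm of the most optimistic surviving model. This is not presented as the authors' algorithm, which must also estimate the models from prior tasks via the method of moments; the function name `model_ucb`, its input format, the Hoeffding-style confidence radius, and the fallback rule are all assumptions made for this sketch.

```python
import math
import random


def model_ucb(models, pull, horizon, delta=0.05):
    """Hypothetical model-restricted UCB sketch over a known finite set of models.

    models:  list of equal-length lists; models[m][k] is the assumed mean reward
             of arm k under candidate model m.
    pull:    callable taking an arm index and returning a reward in [0, 1].
    horizon: number of rounds to play.
    delta:   confidence parameter for the (assumed) Hoeffding-style radius.
    Returns per-arm pull counts and the sequence of pulled arms.
    """
    n_arms = len(models[0])
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    history = []

    def consistent(model, t):
        # A model survives if every pulled arm's empirical mean lies within
        # the confidence radius of that model's mean for the same arm.
        for k in range(n_arms):
            if counts[k] == 0:
                continue  # unpulled arms impose no constraint
            radius = math.sqrt(math.log(n_arms * t * t / delta) / (2 * counts[k]))
            if abs(model[k] - sums[k] / counts[k]) > radius:
                return False
        return True

    for t in range(1, horizon + 1):
        active = [m for m in models if consistent(m, t)]
        if not active:
            active = models  # assumed fallback if every model has been ruled out
        # Optimism over whole models: pick the surviving model whose best arm is largest.
        best_model = max(active, key=max)
        arm = max(range(n_arms), key=lambda k: best_model[k])
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
        history.append(arm)
    return counts, history


if __name__ == "__main__":
    candidate_models = [[0.9, 0.1, 0.5], [0.2, 0.8, 0.5]]  # hypothetical tasks
    true_means = candidate_models[1]  # the current task matches the second model
    rng = random.Random(0)
    bernoulli = lambda k: 1.0 if rng.random() < true_means[k] else 0.0
    counts, _ = model_ucb(candidate_models, bernoulli, horizon=500)
    print(counts)
```

Because optimism is applied to whole models rather than to each arm separately, an arm that is not optimal under any candidate model (arm 2 in the usage example) is never pulled, which is one intuition for why a small, known set of tasks can reduce exploration.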

Similar resources

Bayesian Modeling of Human Sequential Decision-Making on the Multi-Armed Bandit Problem

In this paper we investigate human exploration/exploitation behavior in sequential decision-making tasks. Previous studies have suggested that people are suboptimal at scheduling exploration, and heuristic decision strategies are better predictors of human choices than the optimal model. By incorporating more realistic assumptions about subject’s knowledge and limitations into models of belief ...

Bayesian and Approximate Bayesian Modeling of Human Sequential Decision-Making on the Multi-Armed Bandit Problem

In this paper we investigate human exploration/exploitation behavior in sequential decision-making tasks. Previous studies have suggested that people are suboptimal at scheduling exploration, and heuristic decision strategies are better predictors of human choices than the optimal model. By incorporating more realistic assumptions about subject’s knowledge and limitations into models of belief ...

Bayesian and Approximate Bayesian Modeling of Human Sequential Decision-Making on the Multi-Armed Bandit Problem

In this paper we investigate human exploration/exploitation behavior in a sequential decision-making task. Previous studies have suggested that people are suboptimal at scheduling exploration, and heuristic decision strategies are better predictors of human choices than the optimal model. By incorporating more realistic assumptions about subject’s knowledge and limitations into models of belief...

Linear Programming for Finite State Multi-Armed Bandit Problems

1. Introduction. An important sequential control problem with a tractable solution is the multi-armed bandit problem. It can be stated as follows. There are N independent projects, e.g., statistical populations (see Robbins 1952), gambling machines (or bandits), etc. The state of the pth of them at time t is denoted by x_p(t) and it belongs to a set of possible states S, which in this paper is as...

Finite dimensional algorithms for the hidden Markov model multi-armed bandit problem

The multi-arm bandit problem is widely used in scheduling of traffic in broadband networks, manufacturing systems and robotics. This paper presents a finite dimensional optimal solution to the multi-arm bandit problem for Hidden Markov Models. The key to solving any multi-arm bandit problem is to compute the Gittins index. In this paper a finite dimensional algorithm is presented which exactly ...

Journal title:

Volume, Issue:

Pages: -

Publication date: 2013